Clustering Item Data Sets with Association-Taxonomy Similarity

Authors

  • Ching-Huang Yun
  • Kun-Ta Chuang
  • Ming-Syan Chen

Abstract

In this paper, we explore the efficient clustering of item data. Unlike traditional data, item data are known to be high-dimensional and sparse. In view of these features, we devise a novel measurement, called the association-taxonomy similarity, and use it to perform the clustering. Based on this association-taxonomy similarity measurement, we develop an efficient clustering algorithm for item data, called algorithm AT (standing for Association-Taxonomy). We also devise two validation indexes, based on association and taxonomy properties, to assess the quality of clustering for item data. Experimental results on a real dataset show that algorithm AT significantly outperforms prior works in clustering quality as measured by the validation indexes, indicating the usefulness of association-taxonomy similarity in item data clustering.
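The abstract does not define the association-taxonomy similarity, so the sketch below does not reproduce it or algorithm AT. It only illustrates, under stated assumptions, why a taxonomy can help with the sparsity of item data: two transactions that share no items can still share ancestor categories. The technique shown is a plain Jaccard similarity computed on transactions extended with their taxonomy ancestors, a common generalization trick that is swapped in here for illustration. The toy taxonomy and all function names (`ancestors`, `extend`, `jaccard`, `taxonomy_jaccard`) are hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy taxonomy (child -> parent); assumed acyclic, roots have no entry.
# This is an illustrative assumption, not data from the paper.
TAXONOMY: Dict[str, str] = {
    "skim_milk": "milk",
    "whole_milk": "milk",
    "milk": "dairy",
    "cheddar": "cheese",
    "cheese": "dairy",
    "cola": "soft_drink",
}


def ancestors(item: str, taxonomy: Dict[str, str]) -> Set[str]:
    """Return all ancestors of an item by walking child -> parent links."""
    result: Set[str] = set()
    while item in taxonomy:
        item = taxonomy[item]
        result.add(item)
    return result


def extend(transaction: Set[str], taxonomy: Dict[str, str]) -> FrozenSet[str]:
    """Return the transaction plus every ancestor of every item in it."""
    extended = set(transaction)
    for item in transaction:
        extended |= ancestors(item, taxonomy)
    return frozenset(extended)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def taxonomy_jaccard(t1: Set[str], t2: Set[str], taxonomy: Dict[str, str]) -> float:
    """Jaccard similarity on taxonomy-extended transactions (hypothetical helper)."""
    return jaccard(extend(t1, taxonomy), extend(t2, taxonomy))


if __name__ == "__main__":
    t1 = {"skim_milk", "cola"}
    t2 = {"whole_milk", "cheddar"}
    print(jaccard(frozenset(t1), frozenset(t2)))  # 0.0: no items in common
    print(taxonomy_jaccard(t1, t2, TAXONOMY))     # 0.25: shared ancestors milk, dairy
```

With the toy data, the two baskets have no items in common, so plain Jaccard similarity is 0.0. After extension they share `milk` and `dairy` out of eight distinct items, giving 0.25. This is the kind of signal a taxonomy can recover from sparse item data.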

Similar resources

A Generic Query-Based Model for Scalable Clustering

This paper presents a generic model for clustering that requires no direct knowledge of the nature or representation of the data. In lieu of such knowledge, the relevant-set clustering (RSC) model relies solely on the existence of an oracle that accepts a query in the form of a data item, and returns a ranked set of items relevant to the query. In principle, the role of the oracle could be play...

Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers

In the current study, a particle swarm clustering method is proposed for clustering triangular fuzzy data. The method finds fuzzy cluster centers; the more points from the corresponding cluster a fuzzy cluster center contains, the higher the clustering accuracy. Triangular fuzzy numbers are used to represent uncertain data. To compare triangular fuzzy ...

New distance and similarity measures for hesitant fuzzy soft sets

The hesitant fuzzy soft set (HFSS), a combination of hesitant fuzzy sets and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered important tools in different areas such as ...

Clustering Method Study on High-Dimensional Trading Data

Existing clustering algorithms are not designed specifically for the features of trading data, and most clustering analyses lack scalability for large-scale transactions. We therefore propose a fast, scalable clustering algorithm that uses little space, to effectively process high-dimensional trading data without setting parameters manually. The improved method introduces weighted coverag...

Fuzzy Clustering improves Phylogenetic Relationships Reconstruction from Metabolic Pathways

The interest in reconstructing phylogenetic relationships from data on structural similarity of metabolic pathways is growing. The similarity notions and the techniques involved in this reconstruction are assessed by building phylogenetic relationships for model sets of organisms from the similarity measures of the same metabolic pathway for all of them, and then the phylogenetic trees obtained...

Publication date: 2003